ROBUSTNESS OF MULTIPLE TESTING PROCEDURES 
AGAINST DEPENDENCE 

By Sandy Clarke and Peter Hall 

University of Melbourne 

An important aspect of multiple hypothesis testing is controlling 
the significance level, or the level of Type I error. When the test 
statistics are not independent it can be particularly challenging to 
deal with this problem, without resorting to very conservative pro- 
cedures. In this paper we show that, in the context of contemporary 
multiple testing problems, where the number of tests is often very 
large, the difficulties caused by dependence are less serious than in 
classical cases. This is particularly true when the null distributions of 
test statistics are relatively light-tailed, for example, when they can 
be based on Normal or Student's t approximations. There, if the test 
statistics can fairly be viewed as being generated by a linear process, 
an analysis founded on the incorrect assumption of independence is 
asymptotically correct as the number of hypotheses diverges. In par- 
ticular, the point process representing the null distribution of the 
indices at which statistically significant test results occur is approx- 
imately Poisson, just as in the case of independence. The Poisson 
process also has the same mean as in the independence case, and of 
course exhibits no clustering of false discoveries. However, this result 
can fail if the null distributions are particularly heavy-tailed. There,
clusters of statistically significant results can occur, even when the
null hypothesis is correct. We give an intuitive explanation for these 
disparate properties in light- and heavy-tailed cases, and provide rig- 
orous theory underpinning the intuition. 

1. Introduction. Classical properties of simultaneous hypothesis testing, 
error rate and false-discovery rate are well understood. They have been ex- 
plored extensively, in both practice and theory, in the context of independent 
tests. However, for a range of contemporary applications, multiple testing
problems differ substantially from the conventional. For instance, the
number, $\nu$ say, of tests is often far greater than the number of data, $n$, in the
samples from which test statistics are computed. There is also potential for a 
degree of dependence among samples, even though the data within a sample 
can often fairly be assumed to be independent. 

By way of contrast, in classical settings the value of $\nu$ is relatively small,
and critical points are only moderately large (equivalently, p-values are only 
modestly small). Here a major, noticeable impact of dependence is that it 
results in clusters of rejections. That is, if a test is rejected for a particular 
value of an index, then there are likely to be further rejections for tests that 
have nearby indices (assuming that index order reflects dependence). This 
can impact significantly on the accuracy of multiple testing procedures. 

One approach to alleviating the difficulties caused by dependence is to 
use techniques based on Bonferroni's inequality. However, such bounds are 
quite conservative, and if they could be avoided, then greater precision would 
result. In some settings, where positive dependence is present, corrections 
of Bonferroni type are unnecessary [see Benjamini and Yekutieli (2001) for 
discussion], but in general the nature of dependence is not known reliably. 
Moreover, even in the case of positive dependence it is of interest to know 
whether the test is genuinely conservative, as indicated by conventional the- 
oretical arguments, or whether its level accuracy is virtually the same as in 
the case of independent data. Efron (2007) has suggested correlation correc- 
tions for large-scale simultaneous hypothesis testing. 

One might expect the same difficulties and questions to arise in contemporary
testing problems, where $\nu$ is much greater than $n$. (In some of these
problems, typical values of $\nu$ and $n$ are 10,000 and 20, respectively.) Indeed, there
is reason to suspect that difficulties could increase with increasing $\nu$, since
it can be particularly difficult to model accurately the extremes of depen- 
dent data processes. Additionally, inaccuracies become more obvious as the 
amount of information about a model increases. 

However, it turns out that sometimes, although not always, the problem 
is actually simpler in the contemporary, "$\nu$ much larger than $n$" case. For
example, in cases where test statistics have light-tailed distributions, the 
difficulties caused by dependence tend to retreat as the number of simulta- 
neous tests increases. The number of clusters of false discoveries declines, 
and the distribution of critical-point exceedences closely resembles its coun- 
terpart for independent data. Only for very heavy-tailed data is this property 
violated; for dependent data, when the distribution of the test statistic is 
light-tailed and the number of simultaneous tests is very large, methods that 
would normally be recommended only for independent data can give good 
control of error rate and false-discovery rate. 

This result can be explained intuitively by noting that, in the case of 
light-tailed marginal distributions, exceedences above a high level occur only 
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because neighboring disturbances are fortuitously aligned. Indeed, since the 
tail is light, then it is highly unlikely that a single disturbance is so great 
as to carry the process close to, or over, the level for several different in- 
dices. Instead different, moderately large disturbances reinforce one another, 
by chance, at a particular index. However, at adjacent indices the circum- 
stances that led to alignment change. As a result the propensity for level 
exceedence quickly diminishes, and even disappears. Consequently, clusters 
of exceedences seldom arise. That is, the pattern of exceedences appears as 
though it was produced by a sequence of independent tests, and as a result,
both generalized family-wise error rate and false-discovery rate can
be controlled by appealing to standard arguments for independent tests.

On the other hand, when test statistics have heavy-tailed distributions it 
is possible for a single disturbance to be so great that it carries the value of 
a test statistic over a high level for several indices in a row. In such cases, 
clusters of exceedences occur, and methods based on independent data are 
not adequate for controlling error rates. 

These arguments and properties, especially those in the light-tailed set- 
ting, are applicable only to exceedences of high levels. Very high levels are 
relevant only when the number of simultaneous tests is very large, and so the 
properties tend not to be noticed in conventional multiple testing problems, 
where the number of tests is relatively small. 

In this paper we develop rigorous arguments, using linear-process models 
for test statistics, to capture in theory the ideas discussed above. We show 
that if the test statistic distribution has tails that decay like $\exp(-Cx^{\gamma})$, for
constants $C, \gamma > 0$, then the tails can be regarded as "light" (in the context
of the discussion above) when $\gamma \ge 1$; they are "heavy" when $0 < \gamma < 1$.
However, even in the latter case the problem has many of the characteristics 
of the light-tailed context, unless there are ties among the weights in the 
linear process. Only in very heavy-tailed cases, where the distribution of the 
test statistic decreases at a polynomial rather than exponential rate in the 
tails, are methods based on independent data seen to be inadequate. 

Moreover, even in these heavy-tailed contexts the independent-data approach
can provide good results for large-but-not-too-large $\nu$. A case in point
is that where the test statistic is a Student's t ratio. There, although the 
extreme tails of the test statistic distribution are typically regularly varying 
(e.g., when the sampling distribution is Gaussian), large-deviation properties
show that less extreme parts of the tail are well approximated by the
function $\exp(-Cx^{\gamma})$ for $\gamma = 2$ [see, e.g., Shao (1999) and Wang (2005)].
As a result, good performance can be obtained, in the case of dependent 
t-statistics, by arguing as though the data are independent. 

There is a particularly broad and deep literature on multiple testing procedures,
only a part of it confined to statistics journals. Review-type contributions
include those of Hochberg and Tamhane (1987), who expounded
work on multiple comparisons up to the mid-1980s; Pigeot (2000), who surveyed
conceptual issues in multiple testing; Dudoit, Shaffer and Boldrick
(2003), who reviewed multiple hypothesis testing in microarray settings; 
Bernhard, Klein and Hommel (2004), who discussed literature on global and 
multiple testing; and [Lehmann and Romano (2005), Chapter 9], who dis- 
cussed multiple hypothesis testing in the context of hypothesis testing more 
generally. 

Among contributions related to this paper, Hochberg and Benjamini (1990) 
pointed to the need for procedures that are more powerful than classical
multiple comparison methods, and suggested new, generally applicable
techniques; Rom (1990) introduced methods based on modified Bonferroni
arguments; Dunnett and Tamhane (1995) discussed step-up methods
for multiple testing in the presence of correlation; Wright (1992) developed
p-value adjustments based on Bonferroni's bounds; Benjamini and
Hochberg (1995, 2000) proposed approaches to false-discovery rate in multiple
testing; Blair, Troendle and Beck (1996) introduced methods for controlling
family-wise error rates in multiple procedures; Brown and Russell
(1997) suggested corrections for multiple testing; Olejnik et al. (1997) com- 
pared Bonferroni-type methods; Sarkar and Chang (1997) discussed multi- 
ple testing in the presence of positive dependence; Finner and Roters (1998, 
1999, 2000) gave asymptotic theory for an increasingly large number of 
hypothesis tests, and (2002) discussed the expected number of Type I er- 
rors in multiple testing problems; Holland and Cheung (2002) discussed ro- 
bustness of family-wise error rate; Kesselman, Cribbie and Holland (2002) 
suggested ways of controlling level accuracy over a large number of hy- 
pothesis tests; Genovese and Wasserman (2004) proposed new, stochastic 
process-based methods for controlling false-discovery rate in multiple test- 
ing; Lehmann, Romano and Shaffer (2005) developed optimality theory for 
multiple testing; Rosenberg, Che and Chen (2006) suggested multiple hy- 
pothesis testing methods in a genomic setting; Sarkar (2006) obtained new 
results on false-discovery rates for single-step, multiple testing procedures; 
Schmidt and Stadtmüller (2006) and Schmidt (2007) discussed upper-tailed
dependence; and Yekutieli et al. (2006) developed new approaches to the 
treatment of multiplicity in the setting of microarray analysis. 

The issue of overall error rate, as distinct from the error rate of individual 
tests, was taken up by Godfrey (1985), who drew attention to the tendency 
to enhance the significance of treatment effects if the overall error rate is 
not controlled. See also [Smith et al. (1987), Pocock, Hughes and Lee (1987), 
Gotzsche (1989), Ludbrook (1991), Ottenbacher (1991a, 1991b, 1998) and 
Ottenbacher and Barrett (1991)], who discussed Type I error rate, and prob- 
lems with its assessment, in the evaluation of multiplicity in medical-research 
literature.

2. Error rate and false-discovery rate. Suppose we conduct $\nu$ tests, based
on the respective values of the random variables $X_1, \ldots, X_\nu$. Here, $X_i$ typically
represents a test statistic computed from the $i$th of a sequence of
samples. We reject the $i$th null hypothesis, $H_{0i}$, representing, for example,
the hypothesis that the "center" (e.g., the mean) of the population from
which the $i$th sample is drawn equals zero, if $X_i > t$; if $X_i \le t$, then we do
not reject $H_{0i}$. Let $N$, a random variable, denote the number of rejected
hypotheses:

(2.1)    $N = \sum_{i=1}^{\nu} I(X_i > t)$.

If each of $H_{01}, \ldots, H_{0\nu}$ is correct, and if we view the sequence of $\nu$ tests
as a test of the simultaneous hypothesis $H_0$ that each of the component
hypotheses $H_{0i}$ is true, then the significance level of the simultaneous test
equals the probability that $N \ge 1$. This is the family-wise error rate (FWER)
of the procedure. For example, if $0 < \alpha < 1$ and we define $\beta = -\log(1 - \alpha)$;
if we choose $t$, in (2.1), to satisfy

(2.2)    $P_0(X > t) = \beta/\nu + o(\nu^{-1})$

and if

(2.3)    the random variables $X_i$ are independent and identically distributed as $X$;

then the family-wise error rate converges to $\alpha$ as $\nu$ increases: $P_0(N \ge 1) \to \alpha$.
Here and in (2.2), $P_0$ denotes probability computed under $H_0$.
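
The calibration in (2.2) can be checked numerically. The sketch below is illustrative only: it assumes a standard Normal null distribution for X (the text does not fix one), and the function names are hypothetical. It computes the critical point t for which the tail probability is β/ν, and the exact family-wise error rate under independence.

```python
import math
from statistics import NormalDist


def fwer_threshold(alpha: float, nu: int) -> float:
    """Critical point t with P0(X > t) = beta / nu, where beta = -log(1 - alpha).

    Assumption for illustration: the null distribution of X is standard Normal.
    """
    beta = -math.log(1.0 - alpha)
    return NormalDist().inv_cdf(1.0 - beta / nu)


def fwer_independent(alpha: float, nu: int) -> float:
    """Exact P0(N >= 1) for nu independent tests, each at level beta / nu."""
    beta = -math.log(1.0 - alpha)
    return 1.0 - (1.0 - beta / nu) ** nu
```

For example, `fwer_independent(0.05, nu)` approaches 0.05 from above as `nu` grows, while `fwer_threshold(0.05, nu)` increases slowly with `nu`.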

The assumption in (2.3) that the test statistics $X_i$ are identically distributed
can be relaxed without much difficulty. For example, if $X_i$ is a
Student's t-statistic, then it is permissible for $X_i$ to be based on a sample
of size $n_i$ drawn from a distribution $F_i$, both depending on $i$, provided the
sample sizes and distributions do not vary too greatly with $i$. However, the
assumption of independence in (2.3) is critical to our argument at this point. 

More generally, it is of interest to determine the probability that we make
at least $k$ false discoveries, that is, $P_0(N \ge k)$, where $k \ge 1$ can be arbitrary.
This is the generalized family-wise error rate (GFWER). If $t$ satisfies (2.2),
and if (2.3) holds, then $N$ is asymptotically Poisson-distributed with mean
$\beta$, and so as $\nu \to \infty$,

(2.4)    $P_0(N \ge k) \to \sum_{j=k}^{\infty} \frac{\beta^j}{j!} e^{-\beta}$.
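
As a rough numerical illustration of (2.4), the following sketch compares the Poisson limit with the exact Binomial probability that arises when the ν tests are independent and each rejects with probability exactly β/ν. The function names are hypothetical, and dropping the o(1/ν) remainder in (2.2) is an assumption made for simplicity.

```python
import math


def poisson_tail(beta: float, k: int) -> float:
    """P(Poisson(beta) >= k), the limit on the right-hand side of (2.4)."""
    below = sum(math.exp(-beta) * beta**j / math.factorial(j) for j in range(k))
    return 1.0 - below


def binomial_tail(nu: int, p: float, k: int) -> float:
    """Exact P(N >= k) when N counts successes in nu independent trials."""
    below = sum(math.comb(nu, j) * p**j * (1.0 - p) ** (nu - j) for j in range(k))
    return 1.0 - below
```

With α = 0.05 and ν = 10,000, `binomial_tail(nu, beta / nu, 2)` and `poisson_tail(beta, 2)` agree closely, which is the sense in which the independence-based GFWER calculation is accurate for large ν.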

An alternative, false-discovery rate (FDR) approach, developed by Simes
(1986), Hommel (1988), Hochberg (1988) and Benjamini and Hochberg (1995),
involves a step-down procedure but can be framed in a similar way to
GFWER. [See also Sarkar (1998) and Sen (1999).] In particular, for $i \ge 1$
let $t_1 > t_2 > \cdots$ denote a sequence depending on $\nu$ and with the property,
analogous to (2.2), that

(2.5)    $P_0(X > t_i) = i\beta/\nu + o(\nu^{-1})$.

[Thus, $t$ in (2.2) is here denoted by $t_1$.] Write $N_i$ for the number of values $X_j$
that lie in the interval $(t_i, t_{i-1}]$, where we take $t_0 = \infty$. The event that the
step-down method of Benjamini and Hochberg (1995) does not reject any of
the hypotheses $H_{0i}$, for $1 \le i \le \nu$, is equivalent to the event that, for each
$i$ in the latter range, $X_{(\nu-i+1)} \le t_i$, where $X_{(1)} \le \cdots \le X_{(\nu)}$ represent
the order statistics of the sequence $X_1, \ldots, X_\nu$. In particular, if $k$ denotes
the largest $j$ for which $X_{(\nu-j+1)} > t_j$, then $H_{0i}$ is rejected for each $i$ such
that $X_i = X_{(\nu-j+1)}$, where $1 \le j \le k$.

This indicates that, to describe properties of the false-discovery rate approach,
we need to understand not just the distribution of $N$, defined at
(2.1), but more generally the distribution of

$N^{(k)} = \sum_{i=1}^{\nu} I(X_i > t_k)$.

Note that $N^{(k)} = N_1 + \cdots + N_k$, where

(2.6)    $N_i = \sum_{j=1}^{\nu} I(t_i < X_j \le t_{i-1})$.

Assuming that both (2.3) and (2.5) hold, the random variables $N_1, \ldots, N_k$
are asymptotically independent and Poisson-distributed with mean $\beta$. Therefore,
the probability that the null hypotheses corresponding to the $k$ largest
values of $X_i$ are all rejected under the FDR approach, when they are in fact
all correct, is given by

(2.7)    $P_0(N^{(i)} \ge i \text{ for } 1 \le i \le k) \to P(Q_1 + \cdots + Q_i \ge i \text{ for } 1 \le i \le k)$,

where $Q_1, \ldots, Q_k$ are independent and identically Poisson-distributed with
mean $\beta$. It can be shown from the lemma of Benjamini and Hochberg (1995),
page 293, that the probability on the right-hand side of (2.7) is dominated
by $\beta$, for each $k \ge 1$. Of course, this is useful only if $\beta < 1$.
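
The limiting probability on the right-hand side of (2.7) can be evaluated exactly for small k. The sketch below is one possible way to do this and is not taken from the paper: it tracks the partial sums of independent Poisson variables with mean β by dynamic programming, and the function names are hypothetical.

```python
import math


def poisson_pmf(beta: float, j: int) -> float:
    """P(Poisson(beta) = j)."""
    return math.exp(-beta) * beta**j / math.factorial(j)


def stepdown_limit(beta: float, k: int) -> float:
    """P(Q_1 + ... + Q_i >= i for every 1 <= i <= k), with Q_j iid Poisson(beta).

    Partial sums never decrease, so once a partial sum reaches k all remaining
    constraints hold automatically; that probability mass is kept in `safe`.
    """
    # alive[s] = P(constraints 1..i hold and the i-th partial sum equals s), s < k
    alive = {0: 1.0}
    safe = 0.0
    for i in range(1, k + 1):
        nxt = {}
        for s, prob in alive.items():
            below_k = 0.0
            for q in range(k - s):            # new partial sum s + q stays below k
                pq = poisson_pmf(beta, q)
                below_k += pq
                if s + q >= i:                # constraint i is satisfied
                    nxt[s + q] = nxt.get(s + q, 0.0) + prob * pq
            safe += prob * (1.0 - below_k)    # new partial sum is at least k
        alive = nxt
    return safe
```

For k = 1 the function returns 1 - exp(-β), and for the values tried in the accompanying checks the result stays below β, in line with the bound quoted above.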

In conventional treatments of error rate and false-discovery rate problems,
the right-hand sides of (2.2) and (2.5) would generally be replaced by
$1 - (1 - \beta)^{1/\nu}$ and $i\beta/\nu$, respectively, reflecting an assumption that the null
distribution of $X$ is known exactly. By way of comparison, (2.2) and (2.5)
countenance a certain amount of error in our knowledge of the distribution.

The key approximation properties needed to interpret GFWER and FDR
in practice are (2.4) and (2.7), which describe the probability of making at
least $k$ false discoveries when using the respective methods. In both cases
the assumption of independence, in (2.3), is crucial; without it the Poisson
approximations may be poor. Our aim is to explore the extent to which
the approximations can be rendered invalid by dependence. The context of
family-wise error rate is relatively transparent, and so we shall pay greatest
attention to that, although giving explicit results in the setting of
false-discovery rate.

3. Conditions under which clustering occurs, or fails to occur. 

3.1. Models for clustering and for the process $X_i$. If tests of the hypotheses
$H_{0i}$ are conducted independently of one another, then there is no
evidence of clustering of level exceedences. In particular, if the random variables
$X_i$ are independent and have infinite upper tails, then, trivially,

(3.1)    for each $i_0 \ge 1$, $P(X_i > x \text{ for some } i \text{ with } 1 \le |i| \le i_0 \mid X_0 > x) \to 0$

as $x \to \infty$. We shall define (asymptotic) clustering to occur if (3.1) fails.

Rather than take the $X_i$'s to be independent, we shall model them by a
moving average:

(3.2)    $X_i = \sum_k \theta_k \varepsilon_{i+k}$,

where the $\theta_k$'s are constants and the random variables $\varepsilon_i$, for $-\infty < i < \infty$,
are independent and identically distributed. Motivated by simplicity, and by
the fact that our definition of clustering involves only fixed, finite values of
$i_0$ in (3.1), we shall take the moving average to be of finite order:

(3.3)    $\theta_k = 0$ for all but a finite number of values of $k$, and $\theta_k \ne 0$ for some $k$.

Of course, all our results can be extended to the setting of infinite-order
moving averages with sufficiently rapidly decreasing weights $\theta_k$, and in particular,
all of the $\theta_k$'s can be nonzero. We confine attention to the finite-order
case only for convenience.

The model (3.2) is admittedly rudimentary. However, a more detailed 
treatment, starting from a "time series" model for the data and, through 
that, constructing a model for the statistics Xi, requires specific information 
about the definition of the test statistic. The choice at (3.2) is appropriate if 
the test is being conducted about a mean when the variance is known, and 
in particular if $X_i = n^{-1/2} \sum_{1 \le j \le n} V_{ij}$, where

(3.4)    $V_{ij} = \mu_i + \sum_k \theta_k \varepsilon'_{i+k,j}$    for $1 \le i \le \nu$ and $1 \le j \le n$,

$\mu_i = E(V_{ij})$ and the disturbances $\varepsilon'_{ij}$ are all independent and identically
distributed with zero expected value. Here, (3.2) holds if we take

(3.5)    $\varepsilon_i = n^{-1/2} \sum_{1 \le j \le n} \varepsilon'_{ij}$,

these variables being independent and identically distributed. The null and
alternative hypotheses under test using the statistic $X_i$ are $H_{0i}: \mu_i = 0$ and
$H_{1i}: \mu_i > 0$, respectively.
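
A finite-order moving average of this kind is straightforward to simulate. The sketch below is illustrative: it assumes the index convention X_i = sum over k of theta_k times eps_{i+k}, takes the disturbances to be standard Normal, and uses hypothetical function names.

```python
import random


def moving_average(eps, theta, start, count):
    """Return X_i = sum_k theta[k] * eps[i + k] for i = start, ..., start + count - 1.

    `eps` is a list of disturbances indexed from 0, and `theta` maps each lag k
    (possibly negative) to its weight. The caller must ensure that every index
    i + k falls inside `eps`.
    """
    return [sum(w * eps[i + k] for k, w in theta.items())
            for i in range(start, start + count)]


def simulate_statistics(nu, theta, seed=0):
    """Simulate nu dependent test statistics under the null hypothesis.

    Assumption for illustration: the disturbances are iid standard Normal.
    """
    rng = random.Random(seed)
    lo, hi = min(theta), max(theta)
    pad = max(0, -lo)                         # room for negative lags
    eps = [rng.gauss(0.0, 1.0) for _ in range(pad + nu + max(0, hi))]
    return moving_average(eps, theta, pad, nu)
```

For instance, `simulate_statistics(1000, {0: 0.8, 1: 0.6})` returns 1000 statistics with unit variance in which neighbouring values share one disturbance.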

3.2. Sufficient conditions for no clustering. We first state a simple, sufficient
condition for (3.1). Let the linear process $X_i$ be as at (3.2), let $\mathcal{K}_j$
denote the set of integers $k$ such that $\theta_{k-j} \ne 0$, and put $\mathcal{K}^{(j)} = \mathcal{K}_j \cap \mathcal{K}_0$. We
ask that the independent and identically distributed disturbances satisfy

(3.6) for each ^ > and each j / ^(£fc6/c(^) 9kek>u-v) ^ ^ 

as $u \to \infty$.

Let $1 \le I_1 < I_2 < \cdots$ denote the indices $i$ for which $X_i > t$, where $t$ is as in
(2.2).

Theorem 3.1. If (3.3) and (3.6) hold, then so too does (3.1).

Theorem 3.2. If (2.2) and (3.1) hold, then, for each constant $C > 0$,
the point process $I_1\nu^{-1}, I_2\nu^{-1}, \ldots$, restricted to the interval $[0, C]$, converges
weakly, as $\nu \to \infty$, to a homogeneous Poisson process on $[0, C]$, with intensity
$\beta$.

Theorem 3.2 implies that $N$, at (2.1), is asymptotically Poisson-distributed with mean $\beta$.
The argument leading to Theorem 3.2 also shows that, if (2.5) and (3.1) hold,
then for each $i \ge 1$ the random variables $N_1, \ldots, N_i$, introduced at (2.6),
are asymptotically independent and identically Poisson-distributed with mean $\beta$. Together
these results establish the correctness of the crucial Poisson approximations 
(2.4) and (2.7). 

As noted in Section 2, these results also hold if $X_1, X_2, \ldots$ are independently
distributed. Therefore, under conditions (2.5) and (3.1), exceedences
of the level $t$ by the linear process $X_1, X_2, \ldots$ have the same first-order
asymptotic properties they would enjoy if the $X_i$'s were independent and
identically distributed random variables with the same marginal distribution
as the linear process. In particular, the introduction of dependence does not
produce any first-order evidence of clustering.

Therefore, calibrating the tests using methodology based on the assumption
of independence is adequate if the null distribution of the stochastic
process $X_i$ is close to that of a linear process, if the number of simultaneous
tests is sufficiently large, and if (3.1) holds. In the next section we shall
show that (3.6), and hence (3.1), prevails if the marginal distribution of $X_i$
is light-tailed.

3.3. No clustering occurs for light-tailed distributions. Here we show
that, under the moving-average model defined at (3.2) and (3.3), no clustering
occurs [i.e., (3.1) holds] if the distribution tails decrease like $\exp(-\text{const.}\, x^{\gamma})$
where $\gamma \ge 1$. Therefore, testing can proceed as though the test statistics $X_i$
are independent, which of course they are not.

The case where $\gamma > 1$ is relatively straightforward; there we need assume
only that, for a constant $C > 0$, the density $f$ of the distribution of $\varepsilon$ satisfies,
as $x \to \infty$,

(3.7)    $f(x) = \exp\{o(x^{\gamma})\} \exp(-Cx^{\gamma})$.

A sufficient condition for (3.7) is the following: For constants $C, C_1 > 0$ and
$C_2 \ge 0$,

(3.8)    $f(x) \sim C_1 x^{C_2} \exp(-Cx^{\gamma})$

as $x \to \infty$.

Theorem 3.3. If the process $X_1, X_2, \ldots$ is determined by (3.2); if the
density of $\varepsilon$ exists and satisfies (3.7) with $\gamma > 1$, or satisfies (3.8) with $\gamma = 1$;
and if the weights $\theta_k$ are all nonnegative and satisfy (3.3); then (3.6), and
hence also (3.1), hold.

The assumption that the weights $\theta_k$ are all nonnegative is important, in
that without it, properties of the lower tail of the distribution of $\varepsilon$ would
have to be taken into account. [Conditions (3.7) and (3.8) address only the
upper tail.] Depending on behavior of the lower tail, if one or more of the
$\theta_k$'s is negative, then first-order asymptotic theory can be quite different
from that discussed in Theorems 3.3-3.5.

For example, if the negative $\theta_k$'s form a set $\{\theta_k = -\omega \text{ for } k \in A\}$, where
$\omega > 0$; and if the density of the lower tail of the distribution of $\varepsilon$ satisfies

$f(-x) \sim C_3 x^{C_4} \exp(-C_5 x^{\gamma_1})$

as $x \to \infty$, where $C_3, C_5 > 0$, $C_4 \ge 0$ and $0 < \gamma_1 < 1 \le \gamma$; then the pattern
of exceedences of $t$ [where $t$ still has the property at (2.2)] is first-order
equivalent to that for quite a different process $X_i$, for which the only nonzero
moving-average weights are $\theta_k = \omega$ for $k \in A$, and where the distribution of $\varepsilon$
satisfies (3.8) with $(C, C_1, C_2, \gamma)$ there replaced by $(C_5, C_3, C_4, \gamma_1)$. For such
a process, clustering can occur; see Theorem 3.4 below. Thus, by allowing
negative weights and choosing the lower-tail distribution appropriately, we
can substantially alter the pattern of level exceedences.

The case of Student's t-statistic is related to the model (3.2), but differs 
in important respects. One of these is the potential for the tails of the 
distribution of Xi to become lighter as the "group size," that is, the size 
of the dataset used to compute an individual $X_i$, increases. We shall discuss
this issue in Section 3.6. 

Theorem 3.3 includes the case where the autoregression is a Gaussian 
process. In particular it implies that, in the Gaussian setting, clustering does 
not occur unless, for example, the strength of dependence of the process Xi 
is permitted to increase with $\nu$. We shall take up this issue in Section 3.7,
showing that correlations must converge to 1 at least as fast as (logz^)^"'^ if 
clustering is to be present in asymptotic terms. 

3.4. Clustering can sometimes occur if $0 < \gamma < 1$. The case where (3.7)
or (3.8) holds, and $0 < \gamma < 1$, is relatively complex. There, if the largest $\theta_k$
occurs for a unique value of $k$, then (3.1) holds. That is, the probability that
there exists a cluster of exceedences converges to zero as the exceedence
level, $x$, increases. In this instance, if $\nu$ is sufficiently large, the dependent
test statistics $X_i$ can be treated as though they were independent, without
serious problems arising.

However, if there are ties for the largest $\theta_k$, then the probability of a
cluster does not converge to zero. In this case, if the number of tied values 
equals q, then the probability that the size of the cluster of exceedences also 
equals q, converges to 1 as the exceedence level increases. 

To indicate why the case $\gamma < 1$ is so different, we treat the instance where
$\theta_1 = \cdots = \theta_r$ and each other $\theta_k$ vanishes. In this setting, having
$\varepsilon_1 + \cdots + \varepsilon_r > x$ implies that, with high probability, one of the values of
$\varepsilon_1, \ldots, \varepsilon_r$ is very close to $x$, or greater than $x$, and the other values are all
significantly smaller than $x$. (Here and below we assume that $x$ is large.) That is, just one of the
$\varepsilon_i$'s is responsible for the level exceedence, and its influence can persist,
through weights in the moving average, to ensure that $\varepsilon_{j+1} + \cdots + \varepsilon_{j+r} > x$
for values of $j$ other than simply $j = 0$.

By way of comparison, if $\gamma > 1$ and $\varepsilon_1 + \cdots + \varepsilon_r > x$, then it is highly likely
that this is achieved through all of the $\varepsilon_i$'s being of order $x$; and that, if just
one of the $\varepsilon_i$'s is exchanged for another, the inequality fails. Therefore in this
case, if $\varepsilon_1 + \cdots + \varepsilon_r > x$, then it is unlikely that $\varepsilon_{j+1} + \cdots + \varepsilon_{j+r} > x$ for values
of $j \ne 0$. Therefore clustering can occur if $\gamma < 1$, but is relatively unlikely
if $\gamma > 1$. Our proofs in Section 5 involve verification of general versions of
these properties, which underpin the intuitive arguments given in Section 1.
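
The contrast between the two regimes can be seen in a small simulation. The sketch below is illustrative only and is not part of the paper's theory: it takes r = 2 with equal weights, uses Normal disturbances as the light-tailed case and Pareto disturbances (shape 1.5, a polynomial tail) as a heavy-tailed case, and measures how often an exceedence of a high empirical quantile is followed immediately by another. All names and parameter choices are assumptions.

```python
import random


def neighbour_exceedance_rate(eps, level_quantile=0.999):
    """Among indices i with X_i above a high empirical quantile, return the
    fraction for which X_{i+1} is also above it, where X_i = eps[i+1] + eps[i+2]."""
    x = [eps[i + 1] + eps[i + 2] for i in range(len(eps) - 2)]
    level = sorted(x)[int(level_quantile * len(x))]
    hits = [i for i in range(len(x) - 1) if x[i] > level]
    if not hits:
        return 0.0
    return sum(x[i + 1] > level for i in hits) / len(hits)


def compare_tails(n=200_000, seed=1):
    """Return the neighbour-exceedence rates for (light-tailed, heavy-tailed) noise."""
    rng = random.Random(seed)
    light = [rng.gauss(0.0, 1.0) for _ in range(n)]       # tails like exp(-x^2 / 2)
    heavy = [rng.paretovariate(1.5) for _ in range(n)]    # polynomial tail
    return neighbour_exceedance_rate(light), neighbour_exceedance_rate(heavy)
```

In runs of this kind the heavy-tailed rate is typically far larger than the light-tailed one, because a single very large disturbance lifts both moving-average values that contain it.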

Next we formally state a result describing the case $0 < \gamma < 1$. Write $r$ for
any integer that is not less than the difference between the least, and largest,
values of $k$ for which $\theta_k \ne 0$, and let $M$ denote the number of values $j$ with
$|j| \le r$, for which $X_j > x$.

Theorem 3.4. Assume that (a) the weights $\theta_k$ are all nonnegative and
satisfy (3.3), and (b) the density $f$ of the distribution of $\varepsilon$ exists and satisfies
(3.7) for a value of $\gamma$ in the range $0 < \gamma < 1$. If, in addition, (c) there is
no tie for the largest $\theta_k$, then (i) (3.1) holds. On the other hand, if (a)
and (b) hold, although with (3.8) replacing (3.7) in (b) and, instead of
(c), (d) exactly $q \ge 2$ of the values of $\theta_k$ tie for the maximum, then (ii)
$P(M = q \mid X_0 > x) \to 1$ as $x \to \infty$.

It follows from Theorems 3.2 and 3.4 that if (2.5) and (a)-(c) in Theorem
3.4 hold, then the random variable $N$, at (2.1), is asymptotically Poisson
with mean $\beta$; and likewise, that the random variables $N_1, \ldots, N_k$, defined at
(2.6), are asymptotically independent and Poisson with mean $\beta$. This shows
that, asymptotically, clusters do not occur, and establishes the correctness
of the key Poisson approximations, (2.4) and (2.7), borrowed from the case
where the $X_i$'s are independent.

However, if (c) in Theorem 3.4 fails, and is replaced there by (d), then
with probability converging to 1, clusters exist and are of size $q$. Moreover,
$q^{-1}N$ is asymptotically Poisson, and $q^{-1}N_1, \ldots, q^{-1}N_k$ are asymptotically
independent and Poisson, with mean $\beta/q$ in each case. Therefore (2.4) and
(2.7) fail in this case. For example, (2.7) should be replaced by the result,

$P_0(N^{(i)} \ge i \text{ for } 1 \le i \le k) \to P(qQ_1 + \cdots + qQ_i \ge i \text{ for } 1 \le i \le k)$,

where $Q_1, \ldots, Q_k$ are independent and Poisson with mean $\beta/q$. The fact
that $q^{-1}N$ and $q^{-1}N_i$, rather than $N$ and $N_i$, are independent and Poisson,
follows using part (ii) of Theorem 3.4 and the fact that the probability that a
cluster overlaps the end of the interval $1, 2, \ldots, \nu$ converges to zero as $\nu \to \infty$.

To appreciate intuitively why, in the paragraph above, the Poisson mean
equals $\beta/q$ rather than $\beta$, note that (2.2) and (2.5) imply that $E(N) \to \beta$
and $E(N_i) \to \beta$ as $\nu \to \infty$. However, each time an exceedence occurs it is,
with probability converging to 1, accompanied by $q - 1$ other exceedences,
and so if the number of clusters has mean $\beta_1$, then $\beta_1 q = \beta$, that is, $\beta_1 = \beta/q$.
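
The practical effect of clusters of size q can be illustrated by comparing the two limiting laws for the number of false discoveries: Poisson with mean β under independence, and q times a Poisson variable with mean β/q when exceedences arrive in clusters of size q. The sketch below uses hypothetical function names and simply evaluates the two tail probabilities.

```python
import math


def poisson_tail(mean: float, k: int) -> float:
    """P(Poisson(mean) >= k)."""
    below = sum(math.exp(-mean) * mean**j / math.factorial(j) for j in range(max(k, 0)))
    return 1.0 - below


def gfwer_independent(beta: float, k: int) -> float:
    """Limit of P0(N >= k) when N is asymptotically Poisson(beta)."""
    return poisson_tail(beta, k)


def gfwer_clusters(beta: float, q: int, k: int) -> float:
    """Limit of P0(N >= k) when N / q is asymptotically Poisson(beta / q).

    N >= k requires at least ceil(k / q) clusters.
    """
    return poisson_tail(beta / q, math.ceil(k / q))
```

With β = 0.5 and q = 2, for example, the probability of at least one false discovery is smaller in the clustered case, but the probability of at least two is larger, so the independence-based value of (2.4) is wrong in both directions.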

3.5. Clustering in the case of Pareto-type distributions of disturbances. 
Here we assume that, for constants $C, p > 0$,

(3.9)    $P(\varepsilon > x) \sim Cx^{-p}$

as $x \to \infty$. More generally, $C$ could be replaced by a slowly varying function
of X. In these settings the probability that a cluster of exceedences occurs 
is bounded away from zero, as the exceedence level increases, regardless of 
ties among the moving-average weights. 

To describe the distribution of cluster size, let $\theta_{(1)} \ge \cdots \ge \theta_{(m)}$ denote a
ranking of the $m$ nonzero $\theta_j$'s. Define $\theta_{(q)} = 0$ for $q > m$ and Pq = (0f ^ —
Let $M_0$ denote a random variable for which $P(M_0 = q) = p_q$.
Note that if all the nonzero $\theta_j$'s are equal, then $P(M_0 = m) = 1$. Our next
theorem asserts that the distribution of $M_0$ is the limiting distribution of
cluster size. Given $x > 0$, write $M$ for the number of values $j$ with $|j| \le r$,
for which $X_j > x$, and define $M_1$ to have the distribution of $M$ given that
$M \ge 1$.

Theorem 3.5. If (3.9) holds, and if the weights $\theta_k$ are all nonnegative
and satisfy (3.3), then $P(M_1 = q) \to P(M_0 = q)$ as $x \to \infty$.

Theorem 3.5 implies that both (2.4) and (2.7) fail in the forms given there. 
We now outline modifications to (2.4) and (2.7) that are necessary if those 
results are to hold in the setting of (3.9). 

Put $\mu = E(M_0)$, let $Q$ and $Q_1, Q_2, \ldots$ be independent and identically
Poisson-distributed random variables with mean $\beta/\mu$, and let $M_1, M_2, \ldots$
and $M_{j\ell}$, for $j \ge 1$ and $\ell \ge 1$, be independent random variables each with
the distribution of $M_0$. In cases where (3.9) holds, (2.4) and (2.7) should be
replaced by, respectively,

(3.10)    $P_0(N \ge k) \to P\big(\sum_{j=1}^{Q} M_j \ge k\big)$,

(3.11)    $P_0(N^{(i)} \ge i \text{ for } 1 \le i \le k) \to P\big(\sum_{j=1}^{i} \sum_{\ell=1}^{Q_j} M_{j\ell} \ge i \text{ for } 1 \le i \le k\big)$.

In principle the Pareto parameter $p$, and the constants $\theta_k$ in the
linear-process model, also can be estimated from data, and hence the distribution
of Mq can be estimated. This leads to estimators of the right-hand sides of 
(3.10) and (3.11). However, this approach to statistical analysis will gener- 
ally not be straightforward. 

3.6. The case of Student's t-statistic. The model (3.2) for Xi is directly 
appropriate when the test statistic is a sample mean, but in other cases it 
is only an approximation. For example, in the context of two-channel microarrays,
$X_i$ would be a Studentized mean. In this setting, suppose data
$V_{i1}, \ldots, V_{in}$ are generated as at (3.4), and consider the test that rejects
$H_{0i}: \mu_i = 0$, in favor of $H_{1i}: \mu_i > 0$, if $Y_i > t$, where

(3.12)    $Y_i = \dfrac{n^{-1/2} \sum_{1 \le j \le n} V_{ij}}{\{n^{-1} \sum_{1 \le j \le n} V_{ij}^2 - (n^{-1} \sum_{1 \le j \le n} V_{ij})^2\}^{1/2}}$

is a conventional t-statistic. If $n$ is large, then the distribution of $Y_i$ under $H_{0i}$
can be approximated by the distribution of $X_i$, at (3.2), on taking $\varepsilon_i$ to be
given by (3.5). Moreover, as $n$ increases the distribution of $Y_i$ becomes more
light-tailed, and so high-level exceedences by the $Y_i$'s should become less
clustered. Perhaps surprisingly, "large" $n$ can be very much less than $\nu$ [it
is sufficient that $\log \nu = o(n)$ as $n$ diverges], and the tails of the distribution
of e can be relatively heavy (only E\e\^ < oo is required), without damaging 
the property that high-level crossings are asymptotically independent. Also, 
depending on the weights $\theta_k$, the level, $t$, at which these properties occur
can be substantially lower than in the setting of Theorems 3.1-3.3. These 
results make substantial use of special properties of t-statistics, and will be 
given elsewhere. 
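
As a concrete illustration of the statistic at (3.12), the following minimal sketch computes the Studentized mean of a single sample and applies the one-sided rule "reject if $Y_i > t$". It assumes that (3.12) is the usual ratio of $n^{-1/2} \sum_j V_{ij}$ to the standard deviation computed with divisor n; the function names `t_statistic` and `reject` are illustrative and do not come from the paper.

```python
import math


def t_statistic(v):
    """Studentized mean of one sample, in the form assumed for (3.12)."""
    n = len(v)
    total = sum(v)
    mean = total / n
    # Variance with divisor n, matching the assumed denominator of (3.12).
    var_n = sum(x * x for x in v) / n - mean * mean
    if var_n <= 0:
        raise ValueError("sample has zero variance; statistic undefined")
    return (total / math.sqrt(n)) / math.sqrt(var_n)


def reject(v, t):
    """One-sided test: reject the null hypothesis of zero mean if Y > t."""
    return t_statistic(v) > t
```

For the sample 1, 2, 3, 4 the numerator is $10/\sqrt{4} = 5$ and the denominator is $\sqrt{1.25}$, so the statistic equals $\sqrt{20}$.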

3.7. The case of a highly correlated Gaussian process. The reader will 
have noticed that the strength of dependence permitted by the model (3.2) 
is reasonably low, and might well ask: "Just how strong does dependence 
have to be before clustering becomes apparent?" Our purpose in Section 
3.7 is to respond to that question. In the context of processes for which 
dependence decays to zero over a finite range, the answer is, "The point at 
which clustering is noticed is where the correlation between nearby Xj's is 
1 — const. (log z/)~^/^ -|- o{(log z^)""*^/^}." This is not especially strong correla- 
tion; for each > it is weaker than 1 — const. 

There exist real-world processes where dependence at neighboring indices 
i can be very strong. Consider, for example, the case of speckle imaging 
in astronomy, where noise correlation at neighboring pixels can be particu- 
larly high. This has a significant effect on the potential for resolving (or for 
successfully testing for the existence of) faint light sources in the heavens. 

To model these processes we shall take the variables $\varepsilon_i$, in the moving av-
erage at (3.2), to be independent $N(0, 1)$ random variables, and the weights
$\theta_k$ to be given by

(3.13) $\theta_{-k} = c \prod_{j=0}^{k} \rho_j$ for $k \ge 0$, $\theta_k = 0$ for $k \ge 1$,

where the constants $\rho_j$ are nonnegative, and $c > 0$ is chosen so that $\operatorname{var} X_i = 1$
for each i. If each $\rho_k = \rho$, not depending on k, then $X_i$ is an autoregression
of order 1: $X_i = \rho X_{i-1} + (1 - \rho^2)^{1/2} \varepsilon_i$. We shall instead take

(3.14) $\rho_0 = 1$, $\rho_k = 1 - a_k \delta + o(\delta)$ for $1 \le k \le r$, $\rho_k = 0$ for $k \ge r + 1$,

where $\delta = \delta(\nu) \downarrow 0$ as $\nu \to \infty$, and $a_1, \ldots, a_r$ are nonnegative constants.
Define 

1 

(3.15) Cj = — — ^(afc+iH hofc+j). 

+ -"^ k=o 
Then $\operatorname{cov}(X_i, X_{i-j}) = 0$ for $j \ge r + 1$, whereas for $0 \le j \le r$,

(3.16) $\operatorname{cov}(X_i, X_{i-j}) = 1 - c_j \delta + o(\delta)$.

These properties, and the fact that $\delta$ decreases with increasing $\nu$, imply that
$X_{i_1}$ and $X_{i_2}$ are very highly correlated if $|i_1 - i_2| \le r$, but are independent
otherwise.

We shall give the limiting distribution of cluster size in this setting. To do
so, define $\mathcal{I}$ to be the set of 2r integers between $-r$ and $r$, excluding zero;
and let $Z_i$, for $i \in \mathcal{I}$, denote 2r Normally distributed random variables with
zero means and covariance matrix $\Sigma = (\sigma_{ij})$, where

$\sigma_{ij} = \operatorname{cov}(Z_i, Z_j) = c_{|i|} + c_{|j|} - c_{|i-j|}$

and $c_j$ is as at (3.15). Write $\bowtie_i$ to denote either ">" or "$\le$," and let
$S = (\bowtie_i : i \in \mathcal{I})$ be a sequence of such inequalities. Of course, there are just $2^{2r}$
distinct sequences S. Given a constant $d > 0$, and given a particular sequence
S, define

$\pi(S) = \int_0^\infty P(Z_i \bowtie_i d\, c_{|i|} - d^{-1} z \text{ for } 1 \le |i| \le r)\, e^{-z}\, dz.$

For $0 \le k \le 2r$, let $\pi_k$ equal the sum of $\pi(S)$ over all sequences S that
contain just k ">" signs and $2r - k$ "$\le$" signs. Define $\pi_k(\nu)$ to equal
the probability that exactly k out of the 2r values of $X_i$, for $i \in \mathcal{I}$, exceed t,
conditional on $X_0 > t$.

Theorem 3.6. If the errors $\varepsilon_i$ are independent Normal $N(0, 1)$, so that
the process $X_i$, defined at (3.2), is Gaussian; if the weights $\theta_k$ are given
by (3.13), and the coefficients $\rho_k$ are given by (3.14), with $a_1, \ldots, a_r > 0$; if
the constants $c_j$ are defined in terms of $a_1, \ldots, a_r$ by (3.15); and if t and $\delta^{-1}$ both
diverge as $\nu \to \infty$, with $\delta^{1/2} t \to d$, where $0 \le d \le \infty$; then, for $0 \le k \le 2r$,
"^k^ '^k if < d< oo, TTfc ^ if d= oo, and VTfc — > 1 if d = 0. 

Note that, when X has a normal $N(0, 1)$ distribution, the value of t de-
fined by (2.2) satisfies $t \sim (2 \log \nu)^{1/2}$ as $\nu$ increases. Therefore the condi-
tion invoked in Theorem 3.6, that $\delta^{1/2} t \to d$ for some finite and nonzero
d, is equivalent to the correlation between neighboring Xj's equalling 1 — 
const, (log i^)~^/^ + o{(logz^)~^/^}. 

4. Numerical properties. Our simulations were based on two different
models. In model 1 the test statistic $X_i$ was that given at (3.2), with $\varepsilon$
simulated from a Student's t distribution. Model 2 was the Student's t-
statistic model at (3.12) with n = 10; we took the distribution of $\varepsilon$ to itself
be Student's t. In both models the number of nonzero $\theta_k$'s (which we shall
call r) was taken to equal 1 (independence), 3, 10 or 50, and the nonzero
$\theta_k$'s were taken equal to one another.

The number, $\nu$, of tests was 500, 1000, 2000, 5000 or 10,000 for both
models. A range of tail weights was achieved by varying the number of
degrees of freedom for the distribution of $\varepsilon$; we included infinity, thereby
addressing the case of normally distributed $\varepsilon$. These were scaled so that
$\operatorname{var}(X_i) = 1$ in each case. The chosen critical values were based on controlling
the FWER in the one-sided case with $\alpha = 0.05$. Each simulation involved
10,000 repetitions.
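
The following is a minimal sketch of how one realization under model 1 might be simulated in the Normal-error case (infinite degrees of freedom). It assumes the moving-average form $X_i = \sum_k \theta_k \varepsilon_{i+k}$ with r equal weights, and it assumes that the critical value is found by solving $1 - (1 - p)^{\nu} = \alpha$ for the per-test level p, which is only one reading of "controlling the FWER". All function names are illustrative and are not taken from the paper.

```python
import math
import random
from statistics import NormalDist


def equal_weights(r):
    """r equal nonzero weights, scaled so that var(X_i) = 1 for unit-variance errors."""
    return [1.0 / math.sqrt(r)] * r


def autocovariance(theta, lag):
    """cov(X_i, X_{i+lag}) for X_i = sum_k theta[k] * eps[i + k], with iid unit-variance eps."""
    r = len(theta)
    return float(sum(theta[k] * theta[k + lag] for k in range(max(0, r - lag))))


def critical_value(nu, alpha=0.05):
    """One-sided Normal critical value giving FWER alpha under independence (assumed rule)."""
    per_test = 1.0 - (1.0 - alpha) ** (1.0 / nu)
    return NormalDist().inv_cdf(1.0 - per_test)


def count_rejections(nu, r, t, rng):
    """Simulate one realization with Normal errors; return N, the number of X_i > t."""
    theta = equal_weights(r)
    eps = [rng.gauss(0.0, 1.0) for _ in range(nu + r - 1)]
    n_reject = 0
    for i in range(nu):
        x = sum(theta[k] * eps[i + k] for k in range(r))
        if x > t:
            n_reject += 1
    return n_reject


def clustering_proportion(nu, r, reps, seed=0):
    """Proportion of realizations with N > 1 among those with N > 0 (NaN if none)."""
    rng = random.Random(seed)
    t = critical_value(nu)
    counts = [count_rejections(nu, r, t, rng) for _ in range(reps)]
    positive = [n for n in counts if n > 0]
    if not positive:
        return float("nan")
    return sum(1 for n in positive if n > 1) / len(positive)
```

With equal weights the autocovariance falls linearly from 1 at lag 0 to zero at lag r, so r controls the range of dependence. Heavier-tailed errors would need a Student's t generator rescaled to unit variance, which this sketch omits.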

"Clustering tendency" can be characterized in terms of the value of N, 
that is, the number of rejected hypotheses. If the hypothesis tests are gen-
uinely independent, then most realizations have N equal to 0 or 1; the
proportion of realizations for which $N > 1$ is only 0.0013. However, as the
effects of dependence become more pronounced, leading to greater cluster-
ing, the event $N > 1$ becomes more common, with a corresponding decrease
in the number of events for which $N = 1$. Therefore a succinct way of re-
porting the effect that tail-weight of the error distribution has on clustering
tendency is to graph the proportion of realizations for which $N > 1$, among those
for which $N > 0$, against number of degrees of freedom (df).
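
The figure 0.0013 quoted for the independent case can be checked with a short calculation. The sketch below assumes that, under independence, N is approximately Poisson with mean $\beta$ chosen so that $P(N \ge 1) = \alpha = 0.05$; the function name is illustrative.

```python
import math


def poisson_cluster_baseline(alpha=0.05):
    """Return (beta, P(N > 1), P(N > 1 | N > 0)) for a Poisson count with P(N >= 1) = alpha."""
    beta = -math.log(1.0 - alpha)  # solves 1 - exp(-beta) = alpha
    p_more_than_one = 1.0 - math.exp(-beta) * (1.0 + beta)
    return beta, p_more_than_one, p_more_than_one / alpha


beta, p_gt1, ratio = poisson_cluster_baseline()
print(round(p_gt1, 4), round(ratio, 3))
```

Under this assumption $P(N > 1)$ is about 0.0013, and the conditional proportion $P(N > 1 \mid N > 0)$ is about 0.025. The latter is the baseline against which the plotted proportions can be read.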

This is the approach taken in Figures 1 and 2, which summarize these
results. In both figures, panels (a) through (d) represent the different values
of r (1, 3, 10 or 50, resp.). The horizontal axis gives the number of degrees
of freedom, and each separate line represents a different number of tests, $\nu$.

As Figure 1 indicates, in the case of model 1 there is a clear decrease
in clustering as tail-weight decreases for r = 3, 10 and 50. This reflects the
results in Theorem 3.3, for example. There is also a slightly less clear, but
nevertheless present, decrease in clustering as $\nu$ increases, particularly for
normally distributed $\varepsilon$. While these trends are present for all values of r,
by the time r is as large as 10 the strength of dependence has increased so
much that the decrease in clustering with decreasing tail-weight is noticeably
slower. See, for example, the panels of Figure 1 corresponding to r = 10, 50.

Reflecting the conclusions reached in Section 3.6, Figure 2 indicates that
there is very little clustering under model 2 for r = 3, even for heavy-tailed
$\varepsilon$. There is still clustering for long-range dependency, which persists in the
light-tailed case, although it decreases as $\nu$ increases.

The case of nonequal $\theta_k$ was considered; the cases with larger r behaved
like those with smaller r if the number of large $\theta_k$'s was small.

Fig. 1. Clustering when test statistics are distributed as moving averages. (a) One nonzero value of $\theta_k$; (b) three nonzero values of $\theta_k$;
(c) ten nonzero values of $\theta_k$; (d) fifty nonzero values of $\theta_k$ (for clarity, the horizontal axis is logarithmic).

values of $\theta_k$; (c) ten nonzero values of $\theta_k$; (d) fifty nonzero values of $\theta_k$ (for clarity, the horizontal axis is logarithmic).

5. Technical arguments.

5.1. Proof of Theorem 3.1. To derive (3.1) it suffices to show that, for
each j for which $\mathcal{K}_j$ is a proper subset of $\mathcal{K}_0$, $P(X_j > x \mid X_0 > x) \to 0$. To
this end, put



keKjnKo fce/Cj-n/Co 



where $\widetilde{\mathcal{K}}_0$ denotes the complement of $\mathcal{K}_0$. Then U is independent of both V
and $X_0$, and so

(5.1) $P(X_j > x \mid X_0 > x)$
 $= P(U + V > x \mid X_0 > x)$
 $\le P(u + V > x \mid X_0 > x) + P(U > u).$

The ratio $P(u + V > x)/P(X_0 > x)$ has the same form as the ratio of prob-
abilities in (3.6), with $(u, v)$ there replaced here by $(x, u)$. Hence, by (3.6),
the far right-hand side of (5.1) converges to $P(U > u)$ as $x \to \infty$. Since this
is true for arbitrarily large u, then the far left-hand side of (5.1) converges
to zero as $x \to \infty$. This proves (3.1).

5.2. Proof of Theorem 3.2. Let the integer $\ell$ be so large that, for some
j, the only $\theta_k$'s for which $\theta_k \ne 0$ are included in the set $\theta_{j+1}, \ldots, \theta_{j+\ell}$; and
let $m = m(\nu) \ge 1$ denote an integer satisfying $m \asymp \nu$ as $\nu \to \infty$. Divide the
indices $1, \ldots, m$ into B blocks, each of length $b = b(\nu)$, where $b \to \infty$ and
$b/\nu \to 0$ as $\nu$ increases; with consecutive blocks separated by "spacers" of
length $\ell$; in such a way that $X_1, \ldots, X_b$ and $X_{m-b+1}, \ldots, X_m$ denote the first
and last block, respectively. (This neat fit of the blocks, and their separat-
ing spacers, into the interval $[1, m]$ may require a slight increase in m, but
since $b/\nu \to 0$, then the fit may be achieved without damaging the prop-
erty $m \asymp \nu$.) Define $J_j = 1$ (resp., $K_j = 1$) if $X_i > t$ for some integer i in
the jth block (in the jth spacer), and put $J_j = 0$ ($K_j = 0$) otherwise. Then
$J_1, J_2, \ldots$ are independent random variables [call this property (P1)], as too
are $K_1, K_2, \ldots$. Let $\mathcal{J}(b)$ denote the set of indices j such that $\mathcal{K}_j$ is a
proper subset of $\mathcal{K}_0$ and $|j| \le b$. If b diverges to infinity sufficiently slowly,
then, by (3.1), and as $\nu \to \infty$,

(5.2) $P\{X_j > t \text{ for some } j \in \mathcal{J}(b) \mid X_0 > t\} \to 0.$

Let $M_J$ and $M_K$ denote the number of nonzero $J_j$'s, and number of
nonzero $K_j$'s, respectively. Markov's inequality, (2.2), and the fact that
$b \to \infty$ as $\nu \to \infty$, can be used to show that as $\nu$ increases, $P(M_K = 0) \to 1$
[call this property (P2)] and

(5.3) $\lim_{k \to \infty} \limsup_{\nu \to \infty} P(M_J > k) = 0.$
Result (5.3) implies that the probability that $M_J \le k$ can be made arbitrarily
close to 1, uniformly in $\nu$, by choosing k sufficiently large but fixed. This
property, and (5.2), imply that, with probability converging to 1 as $\nu \to \infty$,
none of the blocks enjoys more than a single exceedence; call this property
(P3). Together, (P2) and (P3) imply that, with probability converging to 1
as $\nu \to \infty$, the number of indices i, for $1 \le i \le m$, such that $X_i > t$, equals
the number of indices j, for $1 \le j \le B$, such that $J_j = 1$. Call this property
(P4).

The Poisson property stated in the theorem, but for the interval $[0, m/\nu]$
rather than $[0, C]$, follows from (P1) and (P4). By taking $m = m(\nu)$ to be so
large that $m/\nu > C$ for all sufficiently large $\nu$ we complete the proof of the
theorem. This argument does not immediately give the mean of the Poisson
distribution. However, simple calculations from (2.2) show that $P(J_j = 1) =
b \beta \nu^{-1} + o(b \nu^{-1})$, not depending on j. From this result, and the fact that
$B \asymp \nu/b$, follows the claim in the theorem that the limiting Poisson process
has intensity $\beta$.
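
The intensity calculation can be spelled out in one line. The sketch below assumes that $B = (m/b)\{1 + o(1)\}$, which is plausible because each block together with its spacer has length $b + \ell$ with $\ell$ fixed and b diverging; this refinement of the stated order of B is an assumption made here for illustration.

```latex
\[
E\Bigl(\sum_{j=1}^{B} J_j\Bigr) = B\,P(J_1 = 1)
  = \frac{m}{b}\,\{1 + o(1)\}\,\bigl\{ b\beta\nu^{-1} + o(b\nu^{-1}) \bigr\}
  = \beta\,\frac{m}{\nu}\,\{1 + o(1)\}.
\]
```

The expected number of exceedance blocks is therefore proportional to the length $m/\nu$ of the interval, with constant of proportionality $\beta$, which is the stated intensity.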

5.3. Proof of Theorem 3.3. First we assume that $\gamma > 1$. Without loss of
generality, the constant C in (5.3) equals 1. We shall prove that in this case,
if $\theta_1, \ldots, \theta_r$ are nonnegative constants, at least one of them positive, and if
$\varepsilon_1, \ldots, \varepsilon_r$ are independent and identically distributed random variables for
which the density satisfies (3.7), then

(5.4) $P(x) \equiv P\Bigl( \sum_{k=1}^{r} \theta_k \varepsilon_k > x \Bigr) = \exp\Bigl[ -\Bigl( \sum_{k=1}^{r} \theta_k^{\gamma/(\gamma-1)} \Bigr)^{-(\gamma-1)} x^{\gamma} + o(x^{\gamma}) \Bigr].$

Result (3.6) follows directly.

Let $\mathcal{U}$ denote the set of points $(u_1, \ldots, u_r)$ such that $\sum_k \theta_k u_k > 1$ and each
$u_k > 0$. It can be deduced from (3.7) that, as $x \to \infty$,

$P(x) = \exp\{o(x^{\gamma})\} \int_{\mathcal{U}} \exp\{ -(u_1^{\gamma} + \cdots + u_r^{\gamma}) x^{\gamma} \}\, du_1 \cdots du_r.$

A Lagrange multiplier argument shows that the minimum of $u_1^{\gamma} + \cdots + u_r^{\gamma}$,
subject to $\sum_k \theta_k u_k \ge 1$ and each $u_k \ge 0$, occurs when $u_k = C_1 \theta_k^{1/(\gamma-1)}$ and
equals $C_1^{\gamma-1}$, where $C_1^{-1} = \sum_k \theta_k^{\gamma/(\gamma-1)}$. Therefore, (5.4) holds.
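
The closed form produced by the Lagrange multiplier argument can be checked numerically in a small case. The sketch below compares it with a brute-force search over the boundary $\theta_1 u_1 + \theta_2 u_2 = 1$ for two positive weights. It assumes $\gamma > 1$, so that the objective is increasing in each coordinate and the constrained minimum lies on that boundary. The function names and the grid size are illustrative choices, not part of the paper.

```python
def closed_form_minimum(theta, gamma):
    """C_1^(gamma - 1), where 1 / C_1 = sum_k theta_k^(gamma / (gamma - 1))."""
    s = sum(th ** (gamma / (gamma - 1.0)) for th in theta if th > 0)
    return s ** (-(gamma - 1.0))


def grid_minimum_two_weights(theta, gamma, steps=20000):
    """Brute-force minimum of u1^gamma + u2^gamma on theta1*u1 + theta2*u2 = 1, u >= 0."""
    th1, th2 = theta
    best = float("inf")
    for i in range(steps + 1):
        u1 = (i / steps) / th1                     # u1 ranges over [0, 1/theta1]
        u2 = max(0.0, (1.0 - th1 * u1) / th2)      # remainder of the constraint goes to u2
        best = min(best, u1 ** gamma + u2 ** gamma)
    return best
```

With $\gamma = 2$ and weights $(1, 0.5)$ the minimizer is proportional to the weights, $(u_1, u_2) = (0.8, 0.4)$, and both routes give 0.8.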

Next we treat the case $\gamma = 1$. It suffices to assume that $C = 1$ and $P(\varepsilon >
0) = 1$. Suppose too that, among the positive weights $\theta_1, \ldots, \theta_r$, there are
just d distinct values of $\theta_k$, given by $\omega_1 > \cdots > \omega_d > 0$, and that these are
repeated $s_1, \ldots, s_d$ times, respectively. Thus, $s_1 + \cdots + s_d$ equals the number,
r, of integers k for which $\theta_k$ is nonzero. Then, writing $du$ for either $du_1 \cdots du_r$
or $du_1 \cdots du_d$, depending on occasion, we have
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'iMlH h6rUr>X, 

Ui,...,Ur>0 



UJlUlA \-LUdUd>X, 

Ml,...,W£j>0 




exp{ — (ui + • • • + Ur)}du 



n ul"^"-'^'^-' exp{-(ni + • • • + Ud)} du 

\k=l J 



*fc(C2 + l)-l 



U-i>{x/u)-l) — {uJ2U2-\ \-UJrUr)/l^l 

ui,...,ua>0 



(5.5) 




X 



Sl(C2+l)-l 




^*fe(C2+l)-l 



0^2 W2H \-UJrU'r^X, 

U2,...,Ud>0 

X eX.p[—{xUJi^ + (1 ~ UJi^UJ2)u2 

H h (1 - uJi^uJd)ud}] du2--- dud 

^a,^i(C2+i)-ig^p(_^/^^)^ 

Result (5.5) gives an asymptotic expression for the denominator in (3.6). 
An asymptotic formula for the probability in the numerator, equal to P2{u — 
v) say, can be derived similarly. To appreciate the conclusion of those calcu- 
lations, let /C(j) denote the set of indices k in ICj for which 6^ = toi, and put 
IC[j] = IC{j) n /C(0). Then, for j / 0, JC[j] is a proper subset of /C[0] = /C(0). 
If IC[j] is empty, thenp2(a^) = 0{e~^^^) for a constant uj e (0, wi). If /C[j] con- 
tains at most si — 1 (> 1) elements, then P2{x) = 0{x^^^'~"^~^^^~'^ exp(— x/wi)}. 
It follows from these properties and (5.5) that, for each v, P2{u — v) / pi{u) 
as — > oo. Therefore (3.6) holds. This establishes Theorem 3.3 in the case 
7 = 1. 

5.4. Proof of Theorem 3.4. In order to prove (i) it suffices to show that,
for each $j \ne 0$,

(5.6) $\dfrac{P(X_0 > x,\, X_j > x)}{P(X_0 > x)} \to 0$

as $x \to \infty$. To achieve this end we shall derive upper and lower bounds for
the numerator and denominator, respectively, on the left-hand side.

Let r denote the number of nonzero values of $\theta_k$, choose $B > 0$ so large
that $a = P(0 < \varepsilon < B) > 0$, write $\omega_1$ and $\omega_2$ for the largest and second-
largest, respectively, values of $\theta_k$, and put $C_3 = (r - 1)\omega_2 B/\omega_1$. Then, since
$0 < \gamma < 1$,

(5.7) $P(X_0 > x) \ge P\{\omega_1 \varepsilon > x - (r - 1)\omega_2 B\}\, a^{r-1}$
 $= \exp\{ -C(x\omega_1^{-1} - C_3)^{\gamma} + o(x^{\gamma}) \}$
 $= \exp\{ -C(x/\omega_1)^{\gamma} + o(x^{\gamma}) \}.$

Next we derive an upper bound to the numerator in (5.6). We may assume,
without loss of generality, that $X_i = \theta_1 \varepsilon_{i+1} + \cdots + \theta_r \varepsilon_{i+r}$, where $\theta_1 \theta_r \ne 0$.
We shall also suppose that $r \ge 2$; the case $r = 1$ is straightforward. Let J
denote a large positive integer, and given $-\infty < j < \infty$, put $\mathcal{I}_j = (j/J, (j +
1)/J]\,x$. Define $\varepsilon'_k = \theta_k \varepsilon_k$ and $\mathcal{E}_{jk} = \{\varepsilon'_k \in \mathcal{I}_j\}$, and let $\xi \in (0, 1)$ be a constant.
Suppose that the unique maximum of $\theta_k$ occurs at $k = \ell$. Then,

P{Xq > X and > (,x for some k G [1, r] with i) 

i:l<i<r, \k=l I 

< E E p({e:>ex}nn 

i:l<i<r, ji,...,jr- V k=l I 

i+l- h+-+3r+r>J 

(5-8) = E E P{e',>ix,£,^,) n n£j,k) 

i:l<i<r, ji,...,jr- k:l<k<r, 
ij^f^ ji+---+jr+r>J k^i 



<exp{o(x'^)} ^ ^ exp[— Cx''' max{(,^/ci;2)'^; 

U^/JO^y}] 



i:l<i<r, ji,---,jr>0 
ii-t- ji+---+jr+r>J 



exp|-Cx^ ^ ^hlJ9k)\ 



ki^i 

li i, then the minimum of J2k: k^ii^k/dkV , subject to J2k'^k = v and 
each Ufc > 0, occurs when U£ = v and = for k ^ i. Hence, given ^ > 0, 
and T] > sufhciently small, we may choose J so large that, uniformly in 
1 <i <r with i, 



E 



jri,...jV>0: 
ji+---+jr+r>J 



-Cx'ymax{{C/u;2V,{j^/J0^r} 



(5.9) -Cx^ ih/JOk) 

k : l<k<r, 
k+i 
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= 0[exp{-(l + r?)C(xM)^}]. 

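Combining (5.8) and (5.9) we deduce that, for each $\xi \in (0, 1)$, there exists
$\eta = \eta(\xi) > 0$ for which, as $x \to \infty$,

(5.10) $P(X_0 > x \text{ and } \varepsilon'_k > \xi x \text{ for some } k \in [1, r] \text{ with } k \ne \ell)$
 $= O[\exp\{ -(1 + \eta) C (x/\omega_1)^{\gamma} \}] = O[\exp\{ -C(x/\omega)^{\gamma} \}],$

where $0 < \omega < \omega_1$.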

Let < ^, ?7 < 1 and define y = x — {r — 2)^ and los = loi + uj2- Then, for 
each i 7^ and for J sufficiently large, the argument leading to (5.9) gives 

P{Xq > x,Xi> x, and < for all k £ [1, r] U [1 — i, r — i] 

except for k = £ or k = £ + i) 

< P{9iee + Oi+iSe+i > y and Oi^iSe + 6i£i+i > y) 



(5.11) 



= 5] exp[-(l - r,)C{2y/u^^V{{h/jr< + (is/J)^] 

\ iiJ2>0: / 

ii+i2+2>J 

= 0[exp{-(l - r/)C(2yM)n] = 0[exp{-C(x/w)n], 

where $\omega$ can be taken in $(0, \omega_1)$ if $\eta$ is chosen sufficiently small. Combining
(5.10) and (5.11) we deduce that, for some $0 < \omega < \omega_1$,

(5.12) $P(X_0 > x, X_j > x) = O[\exp\{ -C(x/\omega)^{\gamma} \}].$

Result (5.6), and hence part (i) of Theorem 3.4, follows from (5.7) and (5.12).
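
The final step can be written out explicitly. Dividing the upper bound (5.12) by the lower bound (5.7) gives the display below; the only ingredient beyond those two results is the elementary fact that $0 < \omega < \omega_1$ implies $\omega^{-\gamma} > \omega_1^{-\gamma}$ when $\gamma > 0$.

```latex
\[
\frac{P(X_0 > x,\, X_j > x)}{P(X_0 > x)}
  = O\Bigl[ \exp\bigl\{ -C x^{\gamma} \bigl( \omega^{-\gamma} - \omega_1^{-\gamma} \bigr) + o(x^{\gamma}) \bigr\} \Bigr]
  \to 0
  \qquad (x \to \infty).
\]
```

The coefficient of $x^{\gamma}$ in the exponent is strictly negative, so the exponential decay dominates the $o(x^{\gamma})$ remainder.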

Next we derive part (ii) of Theorem 3.4. Let $\ell$ denote one of the q distinct
values of k for which $\theta_k = \max\{\theta_1, \ldots, \theta_r\}$; write $\mathcal{I}$ for the set of indices i
such that $1 \le |i| \le r$; let $\mathcal{I}(\ell)$ be the set of $q - 1$ indices $i \in \mathcal{I}$ which are such
that $\theta_{\ell-i} = \theta_\ell$; and let $\mathcal{I}'(\ell)$ be the complement of $\mathcal{I}(\ell)$ in $\mathcal{I}$. Let $\bowtie_i$, for
$i \in \mathcal{I}$, be a sequence composed of the inequalities $\le$ or $>$, as in Section 3.7.
Then, the probability $p(\mathcal{I})$ that $X_i \bowtie_i x$ for each $i \in \mathcal{I}$ and that, in addition,
$X_0 > x$, is given by
p{I)= I< ^ 9kUk+i + 9i_iUe>:iiX for i£l{£), 

yi<k<r:k+i^t 

^ 9kUk+i X for i G l'{£) , ^ 9kUk + 9iUi > x \ 

k=l l<k<r:kyt£ ) 



2r 



2r- 




Part (ii) of Theorem 3.4 can be derived by evaluating this integral, changing 
variable appropriately. The argument is outlined below. 

Write $u'$ for the vector with components $u_{-2r}, \ldots, u_{2r}$, except that $u_\ell$ is
excluded. Let C, $C_1$ and $C_2$ be as in (3.8). Then, changing variable from $u_\ell$
to $v = \theta_\ell u_\ell / x$, we have, as $x \to \infty$,



p{I) = ^ 1 1\- ^ OkUk+i + u cOj 1 for each i G l{tj, 
^^■^ ^ ^ l<k<r:k+i^e 



X 



1 1 

- ^ cOj 1 for each z g2:'(^), - ^ ekUk + v>\ 

^ k=l ^ l<k<r:kj^e 

J I{v cxij 1 for each i G T(^), txi^ 1 for each i G l'{l),v > 1} 



n/K) |/(^) du'dv + o[x^'+'-^exp{-C{x/9er}] 



X 

= — /{ixij equals > for each i gX(^), ocij equals < for each i 
9i 

xj^^f (^^) dv + o[x^2+i-^ exp{-C(x/0,)n] 

= /{cOj equals > for each i gT(^), Odj equals < for each i ^ !'{£)} 

X C-^Ci{x/9e)x^^^^-^exp{-C{x/eey} 

+ o[x^'+^~"' exp{-C{x/9ef}]. 

A similar but simpler argument shows that, as x — > 00, 

P{Xo>x)^C'^Ci{x/9i)x^^+^-^exp{-C{x/9e)^}. 

Therefore, $p(\mathcal{I}) \sim P(X_0 > x)$ if "$\bowtie_i$ equals $>$ for each $i \in \mathcal{I}(\ell)$ and $\bowtie_i$ equals
$\le$ for each $i \in \mathcal{I}'(\ell)$"; while $p(\mathcal{I}) = o\{P(X_0 > x)\}$ if the property in quotation
marks fails. This result implies that $P(M = q, X_0 > x) \sim P(X_0 > x)$, which
is equivalent to part (ii) of Theorem 3.4.

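5.5. Proof of Theorem 3.5. Without loss of generality, $X_i = \theta_1 \varepsilon_{i+1} + \cdots +
\theta_r \varepsilon_{i+r}$ for each i, where $\theta_1 \theta_r \ne 0$. Let $c_1 > 0$ be fixed but arbitrarily large, and
define $\mathcal{I}_1 = (-\infty, -c_1]$, $\mathcal{I}_2 = (-c_1, c_1]$, $\mathcal{I}_3 = (c_1, x/r]$, and $\mathcal{I}_4 = (x/r, \infty)$. Put
$\varepsilon'_k = \theta_k \varepsilon_k$ and $\mathcal{E}_{jk} = \{\varepsilon'_k \in \mathcal{I}_j\}$, for $j = 1, \ldots, 4$. If none of $\varepsilon'_1, \ldots, \varepsilon'_r$ is in $\mathcal{I}_4$,
then $X_0 \le x$. Moreover, the probability, $p(k, x)$ say, that just k of $\varepsilon'_1, \ldots, \varepsilon'_r$
are in $\mathcal{I}_4$, satisfies $p(k, x) \asymp x^{-k\rho}$ as $x \to \infty$. Therefore, if we define $\mathcal{E}_0 =
\{X_0 > x\}$, $\mathcal{E}_4 = \{$exactly one of $\mathcal{E}_{41}, \ldots, \mathcal{E}_{4r}$ holds$\}$ and $\mathcal{E}_5 = \mathcal{E}_0 \cap \mathcal{E}_4$, then, as
$x \to \infty$,

(5.13) $P(\mathcal{E}_0 \setminus \mathcal{E}_5) = O(x^{-2\rho}).$

Put $\mathcal{E}_{6i} = \mathcal{E}_{1i} \cup \mathcal{E}_{3i}$ and $\mathcal{E}_6 = \{$at least one of $\mathcal{E}_{61}, \ldots, \mathcal{E}_{6r}$ holds$\}$. Then,

$P(\mathcal{E}_5 \cap \mathcal{E}_6) \le \sum\sum_{i_1 \ne i_2} P(\mathcal{E}_{4 i_1} \cap \mathcal{E}_{6 i_2}) \le \sum\sum_{i_1 \ne i_2} P(\theta_{i_1} \varepsilon > x/r)\, P(|\theta_{i_2} \varepsilon| > c_1)$
 $\le B_1 x^{-\rho} P\bigl( |\varepsilon| > c_1 \min_k |\theta_k|^{-1} \bigr),$

where $B_1 > 0$ does not depend on $c_1$. Therefore,

$\lim_{c_1 \to \infty} \limsup_{x \to \infty} x^{\rho} P(\mathcal{E}_5 \cap \mathcal{E}_6) = 0.$

Combining this result and (5.13), and defining $\mathcal{E}_7 = \mathcal{E}_0 \cap \mathcal{E}_4 \cap \widetilde{\mathcal{E}}_6$, where $\widetilde{\mathcal{E}}_6$
denotes the complement of $\mathcal{E}_6$, we have,

(5.14) $\lim_{c_1 \to \infty} \limsup_{x \to \infty} x^{\rho} P(\mathcal{E}_0 \setminus \mathcal{E}_7) = 0.$

Let $c_2$ denote any fixed real number, and define $\mathcal{E}_{8i} = \{\varepsilon'_j \in (-c_1, c_1]$ for
each $j \in [1, r]$ for which $j \ne i\}$, and

$\mathcal{E}_9 = \mathcal{E}_9(c_1, c_2, x) = \bigcup_{i=1}^{r} \{\varepsilon'_i > x + c_2\} \cap \mathcal{E}_{8i}.$

Since

$\mathcal{E}_7 = \{X_0 > x\} \cap \{$exactly one of $\mathcal{E}_{41}, \ldots, \mathcal{E}_{4r}$ holds$\}$
 $\cap\, \{$none of $\mathcal{E}_{61}, \ldots, \mathcal{E}_{6r}$ holds$\}$,

then

(5.15) $\bigcup_{i=1}^{r} \{\varepsilon'_i > x + (r - 1)c_1\} \cap \mathcal{E}_{8i} \subseteq \mathcal{E}_7 \subseteq \bigcup_{i=1}^{r} \{\varepsilon'_i > x - (r - 1)c_1\} \cap \mathcal{E}_{8i}.$

However, for each $i \in [1, r]$ and each $c_3 > 0$,

(5.16) $P(x - c_3 < \varepsilon'_i \le x + c_3) = o(x^{-\rho})$

as $x \to \infty$. Writing $\Delta$ for the symmetric-difference binary operator, and
combining (5.14)-(5.16), we deduce that

(5.17) $\lim_{c_1 \to \infty} \limsup_{x \to \infty} x^{\rho} P(\mathcal{E}_0 \,\Delta\, \mathcal{E}_9) = 0.$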

Let $R_j(c_1, x)$, for $j \ge 1$, denote a function of $c_1$ and x satisfying

$\lim_{c_1 \to \infty} \limsup_{x \to \infty} x^{\rho} R_j(c_1, x) = 0,$

let $\theta_{(1)} \ge \cdots \ge \theta_{(m)}$ denote a ranking of the m nonzero $\theta_j$'s, define $\theta_{(q)} = 0$
for $q > m$ and put

r 
i=l 

n {— ci < OkEk+j < ci for each A; G [l,r] for which /c 7^ i}]. 
Then £{d) =f9(ci,0,x), and so, by (5.17), 
P{M = q) 

= P{Xj > X for exactly q values of j satisfying \j\ < r) 
= P{£{j) holds for exactly q values of j satisfying \j\ < r} 
(5.18) +Ri{ci,x) 

m 

= ^ P{e > 6^}jX for exactly q values of j satisfying |j| < r} 

i=l 

+ i?2(ci,x) 
= Cx-M^f^)-^f,+l)) + ^3(ci,x). 

Result (5.18) implies that P{Mi = q) ^ Pq, which is identical to P^Mq = 
q), completing the proof of Theorem 3.5. 

5.6. Proof of Theorem 3.6. Let $U_1 = (X_{-r}, \ldots, X_{-1}, X_1, \ldots, X_r)^T$ and
$U_2 = X_0$, and define $U = (U_1^T, U_2)^T$, a $(2r + 1)$-vector. Partition the covari-
ance matrix, $\Sigma$, of U in the ratio $2r : 1$, meaning that the top left-hand corner
matrix, $\Sigma_{11}$ say, is $2r \times 2r$, the upper right-hand and lower left-hand ma-
trices, $\Sigma_{12}$ and $\Sigma_{21}$, are $2r \times 1$ and $1 \times 2r$, and the lower right-hand corner
matrix is $1 \times 1$ and equals 1. In this notation, $U_1$, conditional on $U_2 = u$, is
Normal $N(\Sigma_{12} u, \Sigma_{11} - \Sigma_{12}\Sigma_{21})$. In view of (3.16), $(\Sigma_{12})_i = 1 - c_{|i|}\delta + o(\delta)$,
$(\Sigma_{11})_{ij} = 1 - c_{|i-j|}\delta + o(\delta)$ and

$(\Sigma_{12}\Sigma_{21})_{ij} = \{1 - c_{|i|}\delta + o(\delta)\}\{1 - c_{|j|}\delta + o(\delta)\}$
 $= 1 - \delta(c_{|i|} + c_{|j|}) + o(\delta),$

and so $(\Sigma_{11} - \Sigma_{12}\Sigma_{21})_{ij} = (\Sigma_1)_{ij}\delta + o(\delta)$, where $(\Sigma_1)_{ij} = c_{|i|} + c_{|j|} - c_{|i-j|}$. There-
fore, conditional on $U_2 = u$, $\delta^{-1/2}(U_1 - \Sigma_{12} u)$ is Normal $N(0, \Sigma_2)$, where
$\Sigma_2 = \Sigma_1 + o(1)$ and does not depend on u. Hence, taking $Z = (Z_{-r}, \ldots, Z_{-1},
Z_1, \ldots, Z_r)$ to be Normal $N(0, \Sigma_2)$, we have:
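
The matrix identity behind this step can be checked numerically. The sketch below drops the $o(\delta)$ remainders, builds $\Sigma_{11}$ and $\Sigma_{12}$ from a sequence $c_j$, and compares $\delta^{-1}(\Sigma_{11} - \Sigma_{12}\Sigma_{21})$ with $\Sigma_1$. The choice $c_j = a j$ used in the usage note is purely illustrative (it mimics an autoregression with $\rho = 1 - a\delta$) and is not the definition at (3.15); the function name is hypothetical.

```python
def leading_matrices(r, delta, c):
    """Return (indices, Sigma11 - Sigma12*Sigma21, Sigma1), ignoring o(delta) terms.

    c is a function giving the illustrative constant c(j) for each lag j >= 0, with c(0) = 0.
    """
    idx = [i for i in range(-r, r + 1) if i != 0]
    sigma11 = {(i, j): 1.0 - c(abs(i - j)) * delta for i in idx for j in idx}
    sigma12 = {i: 1.0 - c(abs(i)) * delta for i in idx}
    # Conditional covariance of U_1 given U_2 (var U_2 = 1, so no inverse is needed).
    cond = {(i, j): sigma11[i, j] - sigma12[i] * sigma12[j] for i in idx for j in idx}
    sigma1 = {(i, j): c(abs(i)) + c(abs(j)) - c(abs(i - j)) for i in idx for j in idx}
    return idx, cond, sigma1
```

For example, with `c = lambda j: 0.5 * j`, `r = 2` and `delta = 1e-4`, every entry of `cond` divided by `delta` differs from the matching entry of `sigma1` by exactly $\delta c_{|i|} c_{|j|}$ up to rounding, which is at most $10^{-4}$ here. In this illustrative case $\Sigma_1$ has entries $2a \min(|i|, |j|)$ when i and j share a sign and 0 otherwise.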

$P(X_i \bowtie_i t \text{ for } 1 \le |i| \le r \mid X_0 > t)$
 $= \int_t^\infty P(X_i \bowtie_i t \text{ for } 1 \le |i| \le r \mid X_0 = u)\, d_u P(X_0 \le u \mid X_0 > t)$
 $= t \int_t^\infty P(X_i \bowtie_i t \text{ for } 1 \le |i| \le r \mid X_0 = u) \exp\{ -\tfrac{1}{2}(u^2 - t^2) \}\, du + o(1)$
 $= \int_0^\infty P(X_i \bowtie_i t \text{ for } 1 \le |i| \le r \mid X_0 = t + v t^{-1}) \exp( -v - \tfrac{1}{2} v^2 t^{-2} )\, dv + o(1)$
 $= \int_0^\infty P[\{1 - c_{|i|}\delta + o(\delta)\}(t + v t^{-1}) + \{1 + o(1)\}\delta^{1/2} Z_i \bowtie_i t \text{ for } 1 \le |i| \le r]\, e^{-v}\, dv + o(1)$
 $= \int_0^\infty P(Z_i \bowtie_i \delta^{1/2} t c_{|i|} - \delta^{-1/2} t^{-1} v \text{ for } 1 \le |i| \le r)\, e^{-v}\, dv + o(1)$
 $= \pi(S) + o(1).$

Adding over sequences S that include just k ">" signs, we deduce that